15 results for Bayesian, statistics, genetics, phenotype analysis, complex diseases, complex etiology, model comparison, latent class analysis, grade of membership, fuzzy clustering, item response theory, migraine, twin study, heritability, genome-wide linkage analysis

at Duke University


Relevance: 100.00%

Abstract:

Alzheimer's disease is a complex and progressive neurodegenerative disease leading to loss of memory, cognitive impairment, and ultimately death. To date, six large-scale genome-wide association studies have been conducted to identify SNPs that influence disease predisposition. These studies have confirmed the well-known APOE epsilon4 risk allele, identified a novel variant that influences disease risk within the APOE epsilon4 population, found a SNP that modifies the age of disease onset, as well as reported the first sex-linked susceptibility variant. Here we report a genome-wide scan of Alzheimer's disease in a set of 331 cases and 368 controls, extending analyses for the first time to include assessments of copy number variation. In this analysis, no new SNPs show genome-wide significance. We also screened for effects of copy number variation, and while nothing was significant, a duplication in CHRNA7 appears interesting enough to warrant further investigation.

Relevance: 100.00%

Abstract:

Scheduling a set of jobs over a collection of machines to optimize a certain quality-of-service measure is one of the most important research topics in both computer science theory and practice. In this thesis, we design algorithms that optimize the flow-time (or delay) of jobs for scheduling problems that arise in a wide range of applications. We consider the classical model of unrelated machine scheduling and resolve several long-standing open problems; we introduce new models that capture the novel algorithmic challenges in scheduling jobs in data centers or large clusters; we study the effect of selfish behavior in distributed and decentralized environments; we design algorithms that strive to balance energy consumption and performance.
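As a small illustration of the objective named above, the flow-time of a job is commonly defined as its completion time minus its release (arrival) time. The sketch below computes the average and maximum flow-time for a given set of completed jobs; the `Job` class and the sample numbers are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float     # time at which the job arrives
    completion: float  # time at which the job finishes under some schedule


def flow_time(job):
    """Flow-time (delay) of one job: how long it spent in the system."""
    return job.completion - job.release


def average_flow_time(jobs):
    """Average flow-time over all jobs."""
    return sum(flow_time(j) for j in jobs) / len(jobs)


def maximum_flow_time(jobs):
    """Largest flow-time experienced by any job."""
    return max(flow_time(j) for j in jobs)


# Hypothetical schedule outcome: three jobs with flow-times 3, 1 and 5.
jobs = [Job(0.0, 3.0), Job(1.0, 2.0), Job(2.0, 7.0)]
print(average_flow_time(jobs), maximum_flow_time(jobs))
```

The two functions correspond to the two offline objectives mentioned in the abstract (average and maximum flow-time); the scheduling algorithms themselves are not sketched here.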

The technically interesting aspect of our work is the surprising connections we establish between approximation and online algorithms, economics, game theory, and queuing theory. It is the interplay of ideas from these different areas that lies at the heart of most of the algorithms presented in this thesis.

The main contributions of the thesis can be placed in one of the following categories.

1. Classical Unrelated Machine Scheduling: We give the first polylogarithmic approximation algorithms for minimizing the average flow-time and minimizing the maximum flow-time in the offline setting. In the online and non-clairvoyant setting, we design the first non-clairvoyant algorithm for minimizing the weighted flow-time in the resource augmentation model. Our work introduces the iterated rounding technique for offline flow-time optimization, and gives the first framework to analyze non-clairvoyant algorithms for unrelated machines.

2. Polytope Scheduling Problem: To capture the multidimensional nature of the scheduling problems that arise in practice, we introduce the Polytope Scheduling Problem (PSP). PSP generalizes almost all classical scheduling models, and also captures hitherto unstudied scheduling problems such as routing multi-commodity flows, routing multicast (video-on-demand) trees, and multi-dimensional resource allocation. We design several competitive algorithms for PSP and its variants for the objectives of minimizing the flow-time and completion time. Our work establishes many interesting connections between scheduling and market equilibrium concepts, fairness and non-clairvoyant scheduling, and the queuing-theoretic notion of stability and resource augmentation analysis.

3. Energy Efficient Scheduling: We give the first non-clairvoyant algorithm for minimizing the total flow-time + energy in the online and resource augmentation model for the most general setting of unrelated machines.

4. Selfish Scheduling: We study the effect of selfish behavior in scheduling and routing problems. We define a fairness index for scheduling policies called bounded stretch, and show that for the objective of minimizing the average (weighted) completion time, policies with small stretch lead to equilibrium outcomes with small price of anarchy. Our work gives the first linear/convex programming duality based framework to bound the price of anarchy for general equilibrium concepts such as coarse correlated equilibrium.

Relevance: 100.00%

Abstract:

Surveys can collect important data that inform policy decisions and drive social science research. Large government surveys collect information from the U.S. population on a wide range of topics, including demographics, education, employment, and lifestyle. Analysis of survey data presents unique challenges. In particular, one needs to account for missing data, for complex sampling designs, and for measurement error. Conceptually, a survey organization could spend lots of resources getting high-quality responses from a simple random sample, resulting in survey data that are easy to analyze. However, this scenario often is not realistic. To address these practical issues, survey organizations can leverage the information available from other sources of data. For example, in longitudinal studies that suffer from attrition, they can use the information from refreshment samples to correct for potential attrition bias. They can use information from known marginal distributions or survey design to improve inferences. They can use information from gold standard sources to correct for measurement error.

This thesis presents novel approaches to combining information from multiple sources that address the three problems described above.

The first method addresses nonignorable unit nonresponse and attrition in a panel survey with a refreshment sample. Panel surveys typically suffer from attrition, which can lead to biased inference when basing analysis only on cases that complete all waves of the panel. Unfortunately, the panel data alone cannot inform the extent of the bias due to attrition, so analysts must make strong and untestable assumptions about the missing data mechanism. Many panel studies also include refreshment samples, which are data collected from a random sample of new individuals during some later wave of the panel. Refreshment samples offer information that can be utilized to correct for biases induced by nonignorable attrition while reducing reliance on strong assumptions about the attrition process. To date, these bias correction methods have not dealt with two key practical issues in panel studies: unit nonresponse in the initial wave of the panel and in the refreshment sample itself. As we illustrate, nonignorable unit nonresponse can significantly compromise the analyst's ability to use the refreshment samples for attrition bias correction. Thus, it is crucial for analysts to assess how sensitive their inferences, corrected for panel attrition, are to different assumptions about the nature of the unit nonresponse. We present an approach that facilitates such sensitivity analyses, both for suspected nonignorable unit nonresponse in the initial wave and in the refreshment sample. We illustrate the approach using simulation studies and an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.

The second method incorporates informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data.
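The augmentation step described above can be sketched as follows. This is only an illustration of steps (i) and (ii) under simple assumptions: records are plain dictionaries, missing values are encoded as `None`, and the function name, variable names and prior probabilities are hypothetical rather than taken from the thesis. Fitting the latent class model to the concatenated data is not shown.

```python
def augment_with_margin(data, variables, margin_var, prior_probs, n_aug):
    """Append synthetic records encoding a prior belief about one margin.

    data        : list of dicts, one per observed record
    variables   : names of all variables in the data
    margin_var  : variable whose marginal distribution the prior describes
    prior_probs : dict mapping each level of margin_var to its prior probability
    n_aug       : number of synthetic records; a larger value means a stronger prior
    """
    synthetic = []
    for level, prob in prior_probs.items():
        # (i) the empirical margin of the synthetic rows matches the prior.
        # Rounding means the total can differ slightly from n_aug.
        for _ in range(round(prob * n_aug)):
            # (ii) every other variable is left missing.
            row = {name: None for name in variables}
            row[margin_var] = level
            synthetic.append(row)
    return data + synthetic


# Hypothetical example: two observed records, prior P(employed = yes) = 0.25.
observed = [
    {"employed": "yes", "degree": "BA"},
    {"employed": "no", "degree": "none"},
]
augmented = augment_with_margin(
    observed, ["employed", "degree"], "employed", {"yes": 0.25, "no": 0.75}, 8
)
print(len(augmented))
```

Choosing `n_aug` small relative to the observed sample lets the data dominate, while a large `n_aug` pins the margin close to the prior belief.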

We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.

The third method leverages the information from a gold standard survey to model reporting error. Survey data are subject to reporting error when respondents misunderstand the question or accidentally select the wrong response. Sometimes survey respondents knowingly select the wrong response, for example, by reporting a higher level of education than they actually have attained. We present an approach that allows an analyst to model reporting error by incorporating information from a gold standard survey. The analyst can specify various reporting error models and assess how sensitive their conclusions are to different assumptions about the reporting error process. We illustrate the approach using simulations based on data from the 1993 National Survey of College Graduates. We use the method to impute error-corrected educational attainments in the 2010 American Community Survey using the 2010 National Survey of College Graduates as the gold standard survey.

Relevance: 100.00%

Abstract:

This is a crucial transition time for human genetics in general, and for HIV host genetics in particular. After years of equivocal results from candidate gene analyses, several genome-wide association studies have been published that looked at plasma viral load or disease progression. Results from other studies that used various large-scale approaches (siRNA screens, transcriptome or proteome analysis, comparative genomics) have also shed new light on retroviral pathogenesis. However, most of the inter-individual variability in response to HIV-1 infection remains to be explained: genome resequencing and systems biology approaches are now required to progress toward a better understanding of the complex interactions between HIV-1 and its human host.

Relevance: 100.00%

Abstract:

Meta-analyses of genome-wide association studies (GWAS) have demonstrated that the same genetic variants can be associated with multiple diseases and other complex traits. We present software called CPAG (Cross-Phenotype Analysis of GWAS) to look for similarities between 700 traits, build trees with informative clusters, and highlight underlying pathways. Clusters are consistent with pre-defined groups and literature-based validation but also reveal novel connections. We report similarity between plasma palmitoleic acid and Crohn's disease and find that specific fatty acids exacerbate enterocolitis in zebrafish. CPAG will become increasingly powerful as more genetic variants are uncovered, leading to a deeper understanding of complex traits. CPAG is freely available at www.sourceforge.net/projects/CPAG/.

Relevance: 100.00%

Abstract:

Trehalose is a non-reducing disaccharide essential for pathogenic fungal survival and virulence. The biosynthesis of trehalose requires the trehalose-6-phosphate synthase, Tps1, and trehalose-6-phosphate phosphatase, Tps2. More importantly, the trehalose biosynthetic pathway is absent in mammals, making this pathway an ideal target for antifungal drug design. However, a lack of germane biochemical and structural information hinders antifungal drug design against these targets.

In this dissertation, macromolecular X-ray crystallography and biochemical assays were employed to understand the structures and functions of proteins involved in the trehalose biosynthetic pathway. I report here the first eukaryotic Tps1 structures from Candida albicans (C. albicans) and Aspergillus fumigatus (A. fumigatus) with substrates or substrate analogs. These structures reveal the key residues involved in substrate binding and catalysis. Subsequent enzymatic assays and cellular assays highlight the significance of these key Tps1 residues in enzyme function and fungal stress response. The Tps1 structure captured in its transition-state with a non-hydrolysable inhibitor demonstrates that Tps1 adopts an “internal return like” mechanism for catalysis. Furthermore, disruption of the trehalose biosynthetic complex formation through abolishing Tps1 dimerization reveals that complex formation has regulatory function in addition to trehalose production, providing additional targets for antifungal drug intervention.

I also present here the structure of the Tps2 N-terminal domain (Tps2NTD) from C. albicans, which may be involved in the proper formation of the trehalose biosynthetic complex. Deletion of the Tps2NTD results in a temperature sensitive phenotype. Further, I describe in this dissertation the structures of the Tps2 phosphatase domain (Tps2PD) from C. albicans, A. fumigatus and Cryptococcus neoformans (C. neoformans) in multiple conformational states. The structures of the C. albicans Tps2PD-BeF3-trehalose complex and C. neoformans Tps2PD(D24N)-T6P complex reveal extensive interactions between both glucose moieties of the trehalose involving all eight hydroxyl groups and multiple residues of both the cap and core domains of Tps2PD. These structures also reveal that steric hindrance is a key underlying factor for the exquisite substrate specificity of Tps2PD. In addition, the structures of Tps2PD in the open conformation provide direct visualization of the conformational changes of this domain that are effected by substrate binding and product release.

Last, I present the structure of the C. albicans trehalose synthase regulatory protein (Tps3) pseudo-phosphatase domain (Tps3PPD). Tps3PPD adopts a haloacid dehalogenase superfamily (HADSF) phosphatase fold with a core Rossmann-fold domain and an α/β fold cap domain. Despite the lack of phosphatase activity, the cleft between the Tps3PPD core domain and cap domain presents a binding pocket for an as yet uncharacterized ligand. Identification of this ligand could reveal the cellular function of Tps3 and any interconnection of the trehalose biosynthetic pathway with other cellular metabolic pathways.

Combined, these structures together with significant biochemical analyses advance our understanding of the proteins responsible for trehalose biosynthesis. These structures are ready to be exploited to rationally design or optimize inhibitors of the trehalose biosynthetic pathway enzymes. Hence, the work described in this thesis has laid the groundwork for the design of Tps1 and Tps2 specific inhibitors, which ultimately could lead to novel therapeutics to treat fungal infections.

Relevance: 100.00%

Abstract:

Constant technological advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for the analysis of biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, can be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate Gaussian vector. With this assumption, a factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions to these methods that are developed to address challenges in big data. The new methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.

Relevance: 100.00%

Abstract:

A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
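One way such a Gaussian approximation could look is a moment-matched normal draw. The sketch below uses the known mean and variance of a Pólya-Gamma PG(b, c) variable, E[ω] = (b / 2c) tanh(c / 2) and Var[ω] = (b / 4c³)(sinh c − c) sech²(c / 2), and is an assumption about the general idea, not the dissertation's actual construction. The function names and the small-`c` cutoffs are hypothetical choices.

```python
import math
import random


def pg_mean(b, c):
    """Mean of a Polya-Gamma PG(b, c) variable."""
    if abs(c) < 1e-8:
        return b / 4.0  # limit as c -> 0
    return b / (2.0 * c) * math.tanh(c / 2.0)


def pg_variance(b, c):
    """Variance of a Polya-Gamma PG(b, c) variable."""
    if abs(c) < 1e-3:
        return b / 24.0  # limit as c -> 0; avoids cancellation in sinh(c) - c
    return b / (4.0 * c ** 3) * (math.sinh(c) - c) / math.cosh(c / 2.0) ** 2


def pg_gaussian_draw(b, c, rng=random):
    """Moment-matched normal draw standing in for a PG(b, c) draw.

    Intended only for large b (high counts), where PG(b, c) is a sum of many
    independent PG(1, c) terms. The draw is not constrained to be positive,
    and very large |c| would overflow sinh.
    """
    return rng.gauss(pg_mean(b, c), math.sqrt(pg_variance(b, c)))


print(pg_mean(1000, 0.5), pg_variance(1000, 0.5))
```

Because the mean grows linearly in `b` while the standard deviation grows like its square root, negative draws become vanishingly rare for the high counts this approximation targets.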

Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.

The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.

The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.

All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.

Relevance: 100.00%

Abstract:

The advances in three related areas of state-space modeling, sequential Bayesian learning, and decision analysis are addressed, with the statistical challenges of scalability and associated dynamic sparsity. The key theme that ties the three areas is Bayesian model emulation: solving challenging analysis/computational problems using creative model emulators. This idea defines theoretical and applied advances in non-linear, non-Gaussian state-space modeling, dynamic sparsity, decision analysis and statistical computation, across linked contexts of multivariate time series and dynamic networks studies. Examples and applications in financial time series and portfolio analysis, macroeconomics and internet studies from computational advertising demonstrate the utility of the core methodological innovations.

Chapter 1 summarizes the three areas/problems and the key idea of emulation in those areas. Chapter 2 discusses the sequential analysis of latent threshold models using emulating models that allow for analytical filtering to enhance the efficiency of posterior sampling. Chapter 3 examines the emulator model in decision analysis, or the synthetic model, that is equivalent to the loss function in the original minimization problem, and shows its performance in the context of sequential portfolio optimization. Chapter 4 describes a method for modeling streaming count data observed on a large network that relies on emulating the whole, dependent network model by independent, conjugate sub-models customized to each set of flows. Chapter 5 reviews those advances and makes concluding remarks.

Relevance: 100.00%

Abstract:

Vertebrate eggs are arrested at Metaphase II by Emi2, the meiotic anaphase-promoting complex/cyclosome (APC/C) inhibitor. Although the importance of Emi2 during oocyte maturation has been widely recognized and its regulation extensively studied, its mechanism of action remained elusive. Many APC/C inhibitors have been reported to act as pseudosubstrates, inhibiting the APC/C by preventing substrate binding. Here we show that a previously identified zinc-binding region is critical for the function of Emi2, whereas the D-box is largely dispensable. We further demonstrate that instead of acting through a "pseudosubstrate" mechanism as previously hypothesized, Emi2 can inhibit Cdc20-dependent activation of the APC/C substoichiometrically, blocking ubiquitin transfer from the ubiquitin-charged E2 to the substrate. These findings provide a novel mechanism of APC/C inhibition wherein the final step of ubiquitin transfer is targeted and raise the interesting possibility that APC/C is inhibited by Emi2 in a catalytic manner.

Relevance: 100.00%

Abstract:

The autosomal recessive kidney disease nephronophthisis (NPHP) constitutes the most frequent genetic cause of terminal renal failure in the first 3 decades of life. Ten causative genes (NPHP1-NPHP9 and NPHP11), whose products localize to the primary cilia-centrosome complex, support the unifying concept that cystic kidney diseases are "ciliopathies". Using genome-wide homozygosity mapping, we report here what we believe to be a new locus (NPHP-like 1 [NPHPL1]) for an NPHP-like nephropathy. In 2 families with an NPHP-like phenotype, we detected homozygous frameshift and splice-site mutations, respectively, in the X-prolyl aminopeptidase 3 (XPNPEP3) gene. In contrast to all known NPHP proteins, XPNPEP3 localizes to mitochondria of renal cells. However, in vivo analyses also revealed a likely cilia-related function; suppression of zebrafish xpnpep3 phenocopied the developmental phenotypes of ciliopathy morphants, and this effect was rescued by human XPNPEP3 that was devoid of a mitochondrial localization signal. Consistent with a role for XPNPEP3 in ciliary function, several ciliary cystogenic proteins were found to be XPNPEP3 substrates, for which resistance to N-terminal proline cleavage resulted in attenuated protein function in vivo in zebrafish. Our data highlight an emerging link between mitochondria and ciliary dysfunction, and suggest that further understanding the enzymatic activity and substrates of XPNPEP3 will illuminate novel cystogenic pathways.

Relevance: 100.00%

Abstract:

To extend the understanding of host genetic determinants of HIV-1 control, we performed a genome-wide association study in a cohort of 2,554 infected Caucasian subjects. The study was powered to detect common genetic variants explaining down to 1.3% of the variability in viral load at set point. We provide overwhelming confirmation of three associations previously reported in a genome-wide study and show further independent effects of both common and rare variants in the Major Histocompatibility Complex region (MHC). We also examined the polymorphisms reported in previous candidate gene studies and fail to support a role for any variant outside of the MHC or the chemokine receptor cluster on chromosome 3. In addition, we evaluated functional variants, copy-number polymorphisms, epistatic interactions, and biological pathways. This study thus represents a comprehensive assessment of common human genetic variation in HIV-1 control in Caucasians.

Relevance: 100.00%

Abstract:

Although lactic acidosis is a prominent feature of solid tumors, we still have limited understanding of the mechanisms by which lactic acidosis influences metabolic phenotypes of cancer cells. We compared global transcriptional responses of breast cancer cells in response to three distinct tumor microenvironmental stresses: lactic acidosis, glucose deprivation, and hypoxia. We found that lactic acidosis and glucose deprivation trigger highly similar transcriptional responses, each inducing features of starvation response. In contrast to their comparable effects on gene expression, lactic acidosis and glucose deprivation have opposing effects on glucose uptake. This divergence of metabolic responses in the context of highly similar transcriptional responses allows the identification of a small subset of genes that are regulated in opposite directions by these two conditions. Among these selected genes, TXNIP and its paralogue ARRDC4 are both induced under lactic acidosis and repressed with glucose deprivation. This induction of TXNIP under lactic acidosis is caused by the activation of the glucose-sensing helix-loop-helix transcriptional complex MondoA:Mlx, which is usually triggered upon glucose exposure. Therefore, the upregulation of TXNIP significantly contributes to inhibition of tumor glycolytic phenotypes under lactic acidosis. Expression levels of TXNIP and ARRDC4 in human cancers are also highly correlated with predicted lactic acidosis pathway activities and associated with favorable clinical outcomes. Lactic acidosis triggers features of starvation response while activating the glucose-sensing MondoA-TXNIP pathways and contributing to the "anti-Warburg" metabolic effects and anti-tumor properties of cancer cells. 
These results stem from integrative analysis of transcriptome and metabolic response data under various tumor microenvironmental stresses and open new paths to explore how these stresses influence phenotypic and metabolic adaptations in human cancers.

Relevance: 100.00%

Abstract:

BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association to a trait and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
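The way an annotation-based prior can feed into an association test may be sketched as follows. The abstract specifies only a binary regression of association status on annotation variables; the logistic link, the coefficient values, the annotation names and the use of a Bayes factor to summarize the study's evidence are all assumptions made for illustration.

```python
import math


def prior_probability(annotations, coefficients, intercept):
    """Prior probability of association from a logistic model (assumed link).

    annotations  : dict of annotation name -> value for one variant
    coefficients : dict of annotation name -> fitted coefficient (hypothetical)
    intercept    : model intercept (hypothetical)
    """
    linear = intercept + sum(
        coefficients[name] * annotations.get(name, 0.0) for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


def posterior_probability(prior_prob, bayes_factor):
    """Combine prior odds with the evidence from the association study."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical variant with two annotations and made-up coefficients.
coefs = {"conserved": 1.2, "in_regulatory_element": 0.8}
variant = {"conserved": 1.0, "in_regulatory_element": 0.0}
prior = prior_probability(variant, coefs, intercept=-9.0)
print(prior, posterior_probability(prior, bayes_factor=1e4))
```

Under this reading, a variant's 'functional signature' shifts its prior odds, so the same observed evidence yields a higher posterior probability for variants whose annotations resemble those of known associations.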

Relevance: 100.00%

Abstract:

Dopamine is an important central nervous system transmitter that functions through two classes of receptors (D1 and D2) to influence a diverse range of biological processes in vertebrates. With roles in regulating neural activity, behavior, and gene expression, there has been great interest in understanding the function and evolution of dopamine and its receptors. In this study, we use a combination of sequence analyses, microsynteny analyses, and phylogenetic relationships to identify and characterize both the D1 (DRD1A, DRD1B, DRD1C, and DRD1E) and D2 (DRD2, DRD3, and DRD4) dopamine receptor gene families in 43 recently sequenced bird genomes representing the major ordinal lineages across the avian family tree. We show that the common ancestor of all birds possessed at least seven D1 and D2 receptors, followed by subsequent independent losses in some lineages of modern birds. Through comparisons with other vertebrate and invertebrate species we show that two of the D1 receptors, DRD1A and DRD1B, and two of the D2 receptors, DRD2 and DRD3, originated from a whole genome duplication event early in the vertebrate lineage, providing the first conclusive evidence of the origin of these highly conserved receptors. Our findings provide insight into the evolutionary development of an important modulatory component of the central nervous system in vertebrates, and will help further unravel the complex evolutionary and functional relationships among dopamine receptors.